Agent Tools Developer Guide

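This guide covers how to add a new agent tool, how artifacts are laid out on disk, and how caching works for agent runs. It also documents the agent's operational model, operator commands, bot tools, and security policies.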

Operational Model

Annolid agent operations are split into two layers:

  • Self-improving: skills and memory evolve behavior without replacing installed code.

  • Self-updating: signed update workflow stages and applies software updates with rollback plans.

Self-improving

  • Skills: loaded with precedence workspace -> managed (~/.annolid/skills) -> bundled.

  • Hot reload: controlled by skills.load.watch and skills.load.pollSeconds.

  • Skill manifest validation: frontmatter is validated at load time; invalid manifests are marked unavailable.

  • Workspace memory: daily notes in memory/YYYY-MM-DD.md and curated long-term notes in memory/MEMORY.md.

  • Pre-compaction flush: transcript snapshot can be appended before compaction via memory flush helpers.

  • Memory retrieval plugin: default is local semantic ranking with keyword fallback (workspace_semantic_keyword_v1).
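
The skill precedence rule above can be sketched as follows. This is a minimal illustration rather than Annolid's loader: resolve_skills, the source labels, and the example paths are hypothetical, and the only behavior assumed is that a skill found in an earlier source shadows a skill of the same name in a later one.

```python
from typing import Dict, List, Tuple

# Source order from highest to lowest precedence, as listed above.
PRECEDENCE: List[str] = ["workspace", "managed", "bundled"]


def resolve_skills(sources: Dict[str, Dict[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Return {skill_name: (source, path)}, keeping the highest-precedence copy."""
    resolved: Dict[str, Tuple[str, str]] = {}
    for source in PRECEDENCE:
        for name, path in sources.get(source, {}).items():
            # setdefault keeps the first entry seen, i.e. the highest precedence.
            resolved.setdefault(name, (source, path))
    return resolved


# Hypothetical skill names and paths, for illustration only.
example = resolve_skills({
    "bundled": {"tracking": "bundled/tracking", "citations": "bundled/citations"},
    "managed": {"tracking": "~/.annolid/skills/tracking"},
    "workspace": {"tracking": "skills/tracking"},
})
```

Here the workspace copy of tracking wins, while citations falls through to the bundled copy because no other source defines it.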

Self-updating

  • Channel-aware update manager supports stable, beta, and dev.

  • Pipeline: preflight -> stage -> verify -> apply -> restart marker -> post-check.

  • Rollback: rollback plan is generated for each run and executed on apply/post-check failures.

  • Canary policy: rollout can enforce rollback thresholds using sample count, failure-rate, and regression limits.

  • Safe update service: supports manifest check, artifact staging/download, checksum verification, signature verification, and transaction reporting.

  • Auto-update: disabled by default; configurable interval+jitter schedule when enabled (ANNOLID_AUTO_UPDATE_* env settings).

  • GUI controls: AI Model Settings -> Agent Runtime includes auto-update enable/channel/check-now/rollback and bot settings for skill hot reload, memory mode, and skill source locations.

  • Production safety policy: in production mode (ANNOLID_PRODUCTION_MODE=1 or ANNOLID_ENV=production), signed update manifests and signed non-builtin skills are required.
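
The pipeline and rollback bullets above can be read as a simple ordered state machine. The sketch below is an assumption-laden illustration, not the update manager's code: run_update, the stage keys, and the log strings are hypothetical, and it assumes that only apply and post-check failures trigger the rollback plan, as the rollback bullet states.

```python
from typing import Callable, Dict, List

STAGES: List[str] = ["preflight", "stage", "verify", "apply", "restart_marker", "post_check"]
# Assumption taken from the rollback bullet: only these stages trigger rollback.
ROLLBACK_ON = {"apply", "post_check"}


def run_update(steps: Dict[str, Callable[[], bool]], rollback: Callable[[], None]) -> List[str]:
    """Run stages in order and stop at the first failure; return a log of outcomes."""
    log: List[str] = []
    for name in STAGES:
        ok = steps[name]()
        log.append(f"{name}:{'ok' if ok else 'failed'}")
        if not ok:
            if name in ROLLBACK_ON:
                rollback()
                log.append("rollback:executed")
            break
    return log


# Simulate a failure during apply.
rollback_calls: List[str] = []
steps = {name: (lambda: True) for name in STAGES}
steps["apply"] = lambda: False
failed_apply_log = run_update(steps, lambda: rollback_calls.append("rolled back"))
```

Failures before apply (for example in verify) stop the run without touching the installed software, so no rollback is needed in this sketch.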

How to add a tool

  1. Define the tool by extending the base class in annolid/core/agent/tools/base.py:

    • Implement run(self, ctx, payload) with your core logic.

    • Use ctx.results_dir and ctx.run_id to derive stable outputs.

    • Use ctx.artifact_store if you want to persist artifacts and participate in caching.

  2. Register the tool in the registry:

    • Add a new tool wrapper in annolid/core/agent/tools/.

    • Export it from annolid/core/agent/tools/__init__.py.

    • Register it with ToolRegistry (see annolid/core/agent/tools/registry.py).

  3. Integrate with the runner (Phase 4+):

    • Compose tools using the registry and a pipeline definition.

    • Ensure inputs/outputs follow the unified data models in base.py.

  4. Write a minimal test:

    • Use tiny inputs and validate outputs.

    • Prefer tests under tests/ that don’t require large external models.
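
As a rough illustration of steps 1 and 4, the sketch below shows the shape of a tool with a run(self, ctx, payload) method and a tiny check of its output. It deliberately does not import the real base class: FakeContext and FrameCountTool are hypothetical stand-ins, and only the ctx.results_dir and ctx.run_id attributes named above are modeled. A real tool would subclass the base class in annolid/core/agent/tools/base.py and follow its data models.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass
class FakeContext:
    """Hypothetical stand-in for the real tool context."""
    results_dir: Path
    run_id: str


class FrameCountTool:
    """Toy tool: counts the frames listed in the payload and writes a summary file."""

    name = "frame_count"  # hypothetical identifier

    def run(self, ctx: FakeContext, payload: Dict[str, Any]) -> Dict[str, Any]:
        frames = payload.get("frames", [])
        # Derive a stable, run-scoped output path from the context.
        out_dir = ctx.results_dir / ".agent_runs" / ctx.run_id
        out_dir.mkdir(parents=True, exist_ok=True)
        out_path = out_dir / "frame_count.txt"
        out_path.write_text(str(len(frames)), encoding="utf-8")
        return {"count": len(frames), "output": str(out_path)}


# Minimal test-style usage with tiny inputs and no external models.
with tempfile.TemporaryDirectory() as tmp:
    result = FrameCountTool().run(FakeContext(Path(tmp), "run-001"), {"frames": [0, 5, 10]})
    written = Path(result["output"]).read_text(encoding="utf-8")
```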

Artifact layout

Artifacts are stored per video results directory and organized as:

  • <results_dir>/

    • agent.ndjson (default agent output)

    • <video_name>_000000000.json + per-frame LabelMe JSON

    • .agent_runs/<run_id>/ (run-scoped artifacts)

    • .cache/agent_cache.json (cache metadata for re-run reuse)

The FileArtifactStore resolves paths relative to:

  • Run artifacts: .agent_runs/<run_id>/...

  • Cache artifacts: .cache/...

See annolid/core/agent/tools/artifacts.py for helpers.
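
The two path rules can be expressed in a few lines. The helper names below are hypothetical and are not the FileArtifactStore API; they only mirror the layout described above.

```python
from pathlib import Path


def run_artifact_path(results_dir: Path, run_id: str, name: str) -> Path:
    """Run-scoped artifacts live under .agent_runs/<run_id>/."""
    return results_dir / ".agent_runs" / run_id / name


def cache_artifact_path(results_dir: Path, name: str) -> Path:
    """Cache artifacts live under .cache/."""
    return results_dir / ".cache" / name


run_file = run_artifact_path(Path("out/video1"), "r1", "summary.json")
cache_file = cache_artifact_path(Path("out/video1"), "agent_cache.json")
```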

Caching semantics

Agent runs compute a content hash from:

  • video path + filesystem stats (size/mtime),

  • behavior spec (full schema),

  • run config (stride, max frames, etc.),

  • model identifiers,

  • output NDJSON name.

If the cache hash matches and both the NDJSON and annotation store exist, the service returns cached results without re-running the agent.
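
One way to build such a content hash is to serialize the listed inputs deterministically and digest the result. The sketch below is an assumption, not the service's implementation: compute_cache_key, the dictionary field names, and the choice of SHA-256 over canonical JSON are all illustrative.

```python
import hashlib
import json
import os
import tempfile
from typing import Any, Dict, List


def compute_cache_key(
    video_path: str,
    behavior_spec: Dict[str, Any],
    run_config: Dict[str, Any],
    model_ids: List[str],
    ndjson_name: str,
) -> str:
    """Hash every input that should invalidate the cache when it changes."""
    stat = os.stat(video_path)
    material = {
        "video": {
            "path": os.path.abspath(video_path),
            "size": stat.st_size,
            "mtime_ns": stat.st_mtime_ns,
        },
        "behavior_spec": behavior_spec,
        "run_config": run_config,
        "models": list(model_ids),
        "ndjson": ndjson_name,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    blob = json.dumps(material, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# A placeholder file stands in for a video; only its path and stats are hashed.
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as handle:
    handle.write(b"placeholder bytes")
    video = handle.name

key_a = compute_cache_key(video, {"behaviors": ["groom"]}, {"stride": 5}, ["model-x"], "agent.ndjson")
key_a_again = compute_cache_key(video, {"behaviors": ["groom"]}, {"stride": 5}, ["model-x"], "agent.ndjson")
key_b = compute_cache_key(video, {"behaviors": ["groom"]}, {"stride": 10}, ["model-x"], "agent.ndjson")
os.remove(video)
```

Identical inputs yield the same key, while changing the stride yields a different key and would therefore force a fresh run.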

To disable reuse from the CLI, run:

annolid-run agent --no-cache ...

Citation management tools

Annolid includes built-in BibTeX tooling for paper citation workflows:

  • CLI:

    • annolid-run citations-list --bib-file refs.bib [--query ...]

    • annolid-run citations-upsert --bib-file refs.bib --key mykey --title ... --author ... --year ...

    • annolid-run citations-remove --bib-file refs.bib --key mykey

    • annolid-run citations-format --bib-file refs.bib

  • Agent function tools:

    • bibtex_list_entries

    • bibtex_upsert_entry

    • bibtex_remove_entry

    • gui_save_citation (save from active PDF/web viewer context)

Examples in Annolid Bot message input:

  • save citation

  • list citations

  • list citations from references.bib for annolid

  • save citation from pdf as annolid2024 to references.bib

  • save citation from web

  • add citation @article{yang2024annolid, title={Annolid: Annotate, Segment, and Track Anything You Need}, author={Yang, Chen and Cleland, Thomas A}, journal={arXiv preprint arXiv:2403.18690}, year={2024}}

  • save citation from web with strict validation

  • save citation from pdf without validation

  • open threejs example two mice

  • open threejs example brain

  • open threejs html /tmp/annolid_threejs_examples/two_mice.html

  • open threejs https://example.org/viewer.html

Default behavior:

  • save citation first attempts Google Scholar BibTeX lookup from the active paper context, then falls back to Crossref/OpenAlex when needed, and saves the merged entry to .bib.

GUI workflow:

  • In Annolid Bot input toolbar, click 📚 to open the citation manager.

  • Manage a .bib file, save citations from active PDF/web context, choose auto-validation or strict mode, view/edit a Source column (URL or PDF path), edit rows inline with year/DOI checks, and remove selected entries.

See also: docs/source/citations_tutorial.md for a full user tutorial.

Operator Commands

Use annolid-run commands for routine operations:

  • annolid-run agent skills refresh [--workspace <path>]

  • annolid-run agent skills inspect [--workspace <path>]

  • annolid-run agent memory flush [--workspace <path>] [--session-id <id>] [--note <text>]

  • annolid-run agent memory inspect [--workspace <path>]

  • annolid-run agent eval run --traces <jsonl> --candidate-responses <jsonl> --out <report.json>

  • annolid-run agent eval build-regression --workspace <path> --out <traces.jsonl> [--min-abs-rating 1]

  • annolid-run agent eval gate --changed-files <files.txt> --report <report.json> [--max-regressions 0] [--min-pass-rate 0.0]

  • annolid-run agent feedback add --workspace <path> --rating -1|0|1 [--trace-id <id>] [--comment <text>] [--expected-substring <text>]

  • annolid-run update check --channel stable|beta|dev [--require-signature]

  • annolid-run update run --channel stable|beta|dev [--execute] [--require-signature] [--skip-post-check] [--canary-metrics <json>]

  • annolid-run update rollback --install-mode package|source --previous-version <X.Y.Z> [--execute]

Admin Function APIs

The agent runtime also exposes operator-style function tools:

  • skills.refresh

  • memory.flush

  • eval.run

  • update.run

    • update.run requires explicit operator consent phrase for execute=true: APPROVE_ANNOLID_CORE_UPDATE (override with ANNOLID_OPERATOR_UPDATE_CONSENT_PHRASE).
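
The consent rule for update.run can be sketched as below. The function name consent_ok and the env parameter are hypothetical; the sketch assumes that dry runs (execute=false) need no phrase and that the environment variable, when set, replaces the default phrase.

```python
import os
from typing import Mapping, Optional

DEFAULT_PHRASE = "APPROVE_ANNOLID_CORE_UPDATE"
OVERRIDE_ENV = "ANNOLID_OPERATOR_UPDATE_CONSENT_PHRASE"


def consent_ok(execute: bool, supplied_phrase: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the call may proceed under the consent rule."""
    env = os.environ if env is None else env
    if not execute:
        return True  # assumption: dry runs need no consent
    expected = env.get(OVERRIDE_ENV) or DEFAULT_PHRASE
    return supplied_phrase == expected


dry_run = consent_ok(False, "", env={})
approved = consent_ok(True, "APPROVE_ANNOLID_CORE_UPDATE", env={})
rejected = consent_ok(True, "yes please", env={})
custom = consent_ok(True, "LAB_APPROVED", env={OVERRIDE_ENV: "LAB_APPROVED"})
```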

Shell Session Tools

For OpenClaw-style shell lifecycle workflows, Annolid now provides session tools:

  • exec_start(command, working_dir?, background?, timeout_s?, pty?)

  • exec_process(action, session_id?, wait_ms?, tail_lines?, text?, submit?)

Supported exec_process.action values:

  • list, poll, log, write, submit, kill

Notes:

  • pty is accepted but currently not enabled (pty_supported=false in responses).

  • Basic dangerous command patterns are blocked at start time.

  • The runtime policy group (group:runtime) now includes exec, exec_start, and exec_process.
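
A typical lifecycle is start, poll, read the log, then kill. The sketch below only builds argument dictionaries using the parameter names from the signatures above; the helper exec_process_args, the command, and the session ID value are hypothetical, and the actual response format is not shown because it is not specified here.

```python
from typing import Any, Dict

# Action values listed above.
EXEC_PROCESS_ACTIONS = {"list", "poll", "log", "write", "submit", "kill"}


def exec_process_args(action: str, **kwargs: Any) -> Dict[str, Any]:
    """Build exec_process arguments, rejecting unsupported actions early."""
    if action not in EXEC_PROCESS_ACTIONS:
        raise ValueError(f"unsupported exec_process action: {action!r}")
    return {"action": action, **kwargs}


# Hypothetical command and session ID, for illustration only.
start_args = {"command": "python train.py", "working_dir": ".", "background": True, "timeout_s": 600}
poll_args = exec_process_args("poll", session_id="sess-1", wait_ms=500)
log_args = exec_process_args("log", session_id="sess-1", tail_lines=50)
kill_args = exec_process_args("kill", session_id="sess-1")
```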

Improvement Quality Loop

  • Anonymized run traces: workspace/eval/run_traces.ndjson captures hashed session/channel/chat IDs and redacted text previews.

  • Explicit user feedback: workspace/eval/feedback.ndjson stores rating/comment/optional expected substring for promotion signals.

  • Regression dataset build: combines traces + feedback into eval traces for CI and pre-promotion checks.

  • Shadow mode: enable ANNOLID_AGENT_SHADOW_MODE=1 to log alternative routing decisions to workspace/eval/shadow_routing.ndjson. Use annolid-run agent skills shadow --candidate-pack <dir> to compare candidate skill packs before promotion.

Governance and Audit

Governance events are stored as NDJSON with default path:

  • ~/.annolid/governance/events.ndjson

You can override it with:

  • ANNOLID_GOVERNANCE_EVENTS_PATH=/custom/path/events.ndjson

Audited event categories include skill snapshot/refresh changes, memory writes/flushes, update stage/run actions, and rollback outcomes.
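
The path override and the NDJSON format (one JSON object per line) can be sketched as follows. The function names, the record fields (timestamp, category, detail), and the category strings are hypothetical; only the default path and the environment variable come from this guide.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def governance_events_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the events file, honoring the environment override."""
    env = os.environ if env is None else env
    override = env.get("ANNOLID_GOVERNANCE_EVENTS_PATH")
    if override:
        return Path(override)
    return Path.home() / ".annolid" / "governance" / "events.ndjson"


def append_event(path: Path, category: str, detail: Dict[str, Any]) -> None:
    """Append one JSON object per line (NDJSON)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"timestamp": time.time(), "category": category, "detail": detail}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    custom = str(Path(tmp) / "audit" / "events.ndjson")
    events_file = governance_events_path({"ANNOLID_GOVERNANCE_EVENTS_PATH": custom})
    append_event(events_file, "memory_flush", {"session_id": "demo"})
    append_event(events_file, "update_rollback", {"outcome": "ok"})
    lines = events_file.read_text(encoding="utf-8").splitlines()
```

Because each line is a complete JSON document, the log can be tailed or filtered line by line without parsing the whole file.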

Three.js bot tools

Annolid Bot supports direct Three.js viewer control in GUI sessions.

  • Function tools:

    • gui_open_threejs(path_or_url)

    • gui_open_threejs_example(example_id)

  • Built-in example IDs:

    • two_mice_html (default)

    • brain_viewer_html

    • helix_points_csv

    • wave_surface_obj

    • sphere_points_ply

The bot recognizes natural-language commands such as open threejs example ....

Browser Automation Safety

Annolid supports MCP browser automation with both granular tools and a unified tool:

  • mcp_browser (single control surface with actions: status|start|stop|navigate|snapshot|screenshot|act|wait)

  • mcp_browser_navigate, mcp_browser_click, mcp_browser_type, etc.

Navigation hardening:

  • browser navigation allows http://, https://, and about:blank only.

  • unsafe schemes such as file://, javascript:, and data: are blocked.

  • GUI open_url also blocks file://; use an explicit local file path instead.
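
The navigation rules above amount to a scheme allow-list. The following is a minimal sketch of such a check using the standard library, not Annolid's validator: is_navigation_allowed is a hypothetical name, and requiring a non-empty host for http/https URLs is an added assumption.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_navigation_allowed(url: str) -> bool:
    """Allow http://, https://, and about:blank; reject everything else."""
    candidate = url.strip()
    if candidate.lower() == "about:blank":
        return True
    parts = urlsplit(candidate)
    # Assumption: an http(s) URL must also name a host.
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)


checks = {
    url: is_navigation_allowed(url)
    for url in [
        "https://example.org/viewer.html",
        "about:blank",
        "file:///etc/passwd",
        "javascript:alert(1)",
        "data:text/html,hello",
    ]
}
```

An allow-list is used rather than a block-list so that unknown or newly introduced schemes are rejected by default.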

Annolid code/docs Q&A and tutorials

Annolid Bot is optimized to answer Annolid-specific questions from local docs and code context.

  • It can explain modules, workflows, and settings with file-path references.

  • It can generate on-demand tutorials for requested topics and levels using the active chat model, grounded by Annolid docs/code evidence.

  • When a tutorial is saved to Markdown, Annolid Bot auto-opens the generated .md in the embedded web viewer.

  • Direct command examples:

    • create on demand tutorial for realtime camera setup in annolid

    • create beginner tutorial for behavior analysis and save to markdown file

    • how do i use annolid for behavior analysis

Realtime camera snapshot + email

Annolid Bot can capture a snapshot from a camera stream and send it by email.

  • Stream snapshot:

    • GUI sessions: use gui_check_stream_source with save_snapshot=true.

    • This GUI tool now runs a full camera mission pipeline:

      • probe -> capture -> annotate -> notify/email

      • returns explicit camera_mission.steps and delivery status objects.

    • Non-GUI channels (for example email/IM): use camera_snapshot.

    • Snapshot files are saved under .annolid/workspace/camera_snapshots/.

    • Outlook Safe Links camera URLs are automatically unwrapped to the original stream URL.

    • Source fallback policy is intent-aware:

      • eye-blink intent defaults to camera 0

      • network camera intent prefers remembered network streams.

  • Email with attachments:

    • Use the email tool with:

      • to

      • subject

      • content

      • optional attachment_paths (list of local file paths)

Example bot intent:

  • check wireless camera, save a snapshot, and email it to user@example.com
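
Outlook Safe Links typically wrap the original address in a url query parameter on a safelinks.protection.outlook.com host. Unwrapping can be sketched as below; unwrap_safe_link is a hypothetical name, the camera address is made up, and Annolid's own unwrapping logic may differ.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_safe_link(url: str) -> str:
    """Return the original URL if `url` is a Safe Links wrapper, else `url` unchanged."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host.endswith(".safelinks.protection.outlook.com"):
        return url
    # parse_qs percent-decodes the wrapped address.
    targets = parse_qs(parts.query).get("url")
    return targets[0] if targets else url


wrapped = (
    "https://nam02.safelinks.protection.outlook.com/"
    "?url=http%3A%2F%2F192.168.1.20%3A8080%2Fvideo&data=05%7C01&reserved=0"
)
original = unwrap_safe_link(wrapped)
untouched = unwrap_safe_link("rtsp://cam.local/stream")
```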

Realtime email/report spam control:

  • Realtime bot report interval controls report cadence.

  • Email requests are additionally subject to a minimum interval (bot_email_min_interval_sec, default 60s) so that repeated requests are not sent in quick succession.
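
A minimum-interval guard of this kind can be sketched as a small throttle. EmailThrottle is a hypothetical class, not the bot's implementation; only the 60-second default comes from the setting above, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class EmailThrottle:
    """Allow an email only if the minimum interval has elapsed since the last one."""

    def __init__(self, min_interval_sec: float = 60.0) -> None:
        self.min_interval_sec = min_interval_sec
        self._last_sent: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self._last_sent is not None and now - self._last_sent < self.min_interval_sec:
            return False
        # Record the send time only when the email is allowed.
        self._last_sent = now
        return True


throttle = EmailThrottle()
decisions = [throttle.allow(t) for t in (0.0, 30.0, 60.0, 119.0)]
```

In this sketch a rejected request does not reset the timer, so a burst of requests cannot postpone the next allowed email indefinitely.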

Security and policy hardening (Phase 2)

Phase 2 adds stricter defaults for tool access and data handling:

  • Capability-oriented tool profiles:

    • gui, email, realtime, filesystem

    • explicit capability expressions are supported, for example:

      • capability:gui,email

      • capability:gui+realtime

  • Snapshot path hardening:

    • camera_snapshot writes only under workspace camera_snapshots/.

    • symlink escape paths are rejected.

  • Redaction-at-source:

    • private/local stream endpoints are redacted in outbound content.

    • sensitive metadata keys (for example peer_id, account_id) are redacted before publish.

  • Runtime high-risk guard:

    • deny-by-default blocks risky multi-tool chains unless explicit intent is provided.

    • config toggle: agents.defaults.strict_runtime_tool_guard (default true).

Example config:

{
  "agents": {
    "defaults": {
      "strict_runtime_tool_guard": true
    }
  }
}
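
The snapshot path hardening described earlier in this section (writes confined to camera_snapshots/, symlink escapes rejected) can be approximated by resolving paths before comparing them. This is a sketch under assumptions: safe_snapshot_path is a hypothetical helper and is not the camera_snapshot tool's code.

```python
import tempfile
from pathlib import Path


def safe_snapshot_path(workspace: Path, filename: str) -> Path:
    """Resolve `filename` under <workspace>/camera_snapshots/ or raise ValueError."""
    root = (workspace / "camera_snapshots").resolve()
    # resolve() follows symlinks and collapses "..", so escapes become visible.
    candidate = (root / filename).resolve()
    if root not in candidate.parents:
        raise ValueError(f"snapshot path escapes {root}: {filename!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    ok_path = safe_snapshot_path(workspace, "cam0.jpg")
    try:
        safe_snapshot_path(workspace, "../outside.jpg")
        escaped = True
    except ValueError:
        escaped = False
```

Comparing resolved paths rather than raw strings is what catches both ../ traversal and symlinks that point outside the snapshot directory.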

Explicit high-risk intent markers supported by policy/runtime guards:

  • intent:high-risk

  • intent:high_risk

  • allow:high-risk

  • allow_high_risk

  • unsafe:high-risk
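
A deny-by-default guard keyed on these markers could look like the sketch below. The function names are hypothetical, and case-insensitive substring matching is an assumption; the real guard may parse markers differently.

```python
HIGH_RISK_MARKERS = (
    "intent:high-risk",
    "intent:high_risk",
    "allow:high-risk",
    "allow_high_risk",
    "unsafe:high-risk",
)


def has_high_risk_intent(text: str) -> bool:
    """Assumption: markers are matched as case-insensitive substrings."""
    lowered = text.lower()
    return any(marker in lowered for marker in HIGH_RISK_MARKERS)


def allow_tool_chain(is_risky_chain: bool, text: str, strict_guard: bool = True) -> bool:
    """Deny risky chains by default unless an explicit marker is present."""
    if not strict_guard or not is_risky_chain:
        return True
    return has_high_risk_intent(text)


blocked = allow_tool_chain(True, "snapshot the camera and email everyone")
permitted = allow_tool_chain(True, "intent:high-risk snapshot the camera and email it")
guard_off = allow_tool_chain(True, "snapshot and email", strict_guard=False)
```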

Session memory and replay

Annolid agent sessions now keep separated memory layers and replayable event logs.

  • Working memory:

    • short-horizon session summary derived from recent user/assistant turns.

    • stored in session metadata as working_memory.

    • bounded by a character quota in PersistentSessionStore.

  • Long-term memory:

    • stable facts/notes derived from session facts and consolidation updates.

    • stored in session metadata as long_term_memory.

    • bounded by a character quota in PersistentSessionStore.

Deterministic consolidation and telemetry

Memory consolidation now uses deterministic triggers based on:

  • session turn counter (turn_counter)

  • next scheduled consolidation turn (next_consolidation_turn)

  • history length relative to memory window

Telemetry is persisted in session metadata as memory_telemetry with entries like:

  • timestamp

  • outcome (for example llm_consolidated, skipped_short_transcript, not_due)

  • history_len, archive_len, keep_len

  • elapsed_ms
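
A deterministic trigger means the decision depends only on stored counters, never on timing or randomness. The sketch below shows one plausible reading: consolidation_decision and memory_window are hypothetical names, the exact comparison rules are assumptions, and the returned "due" label is illustrative (the actual outcome recorded after a successful run would be something like llm_consolidated).

```python
def consolidation_decision(
    turn_counter: int,
    next_consolidation_turn: int,
    history_len: int,
    memory_window: int,
) -> str:
    """Return 'not_due', 'skipped_short_transcript', or 'due' from counters alone."""
    if turn_counter < next_consolidation_turn:
        return "not_due"
    if history_len <= memory_window:
        # Assumption: a history that fits inside the window is not worth consolidating.
        return "skipped_short_transcript"
    return "due"


early = consolidation_decision(turn_counter=3, next_consolidation_turn=5, history_len=40, memory_window=20)
short = consolidation_decision(turn_counter=5, next_consolidation_turn=5, history_len=10, memory_window=20)
due = consolidation_decision(turn_counter=5, next_consolidation_turn=5, history_len=40, memory_window=20)
```

Because the same inputs always give the same decision, a telemetry entry can be reproduced later from the recorded counters.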

Memory mutation audit trail

Session metadata contains memory_audit_trail entries for memory changes, including:

  • timestamp

  • scope (facts, working_memory, long_term_memory)

  • mutation (for example set_fact, set_working_memory)

  • reason

  • turn_id

  • before_chars / after_chars

Safe replay for debugging

Session event records are stored in metadata key event_log.

  • Each entry includes:

    • timestamp

    • direction (inbound/outbound)

    • kind (for example user, assistant, progress, final)

    • optional turn_id, event_id, idempotency_key

    • payload

GUI/backend helpers:

  • replay_session_debug_events(session_store=..., session_id=..., direction="", limit=200)

  • format_replay_as_text(events)

These helpers are implemented in:

  • annolid/core/agent/gui_backend/session_io.py
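
To illustrate what replaying an event_log involves, the sketch below filters and formats entries shaped like the ones described above. It does not call the real helpers: filter_events and events_as_text are hypothetical stand-ins, and it assumes that an empty direction means no filtering and that limit keeps the most recent entries.

```python
from typing import Any, Dict, List


def filter_events(event_log: List[Dict[str, Any]], direction: str = "", limit: int = 200) -> List[Dict[str, Any]]:
    """Keep entries matching `direction` (empty means all), newest `limit` only."""
    selected = [e for e in event_log if not direction or e.get("direction") == direction]
    return selected[-limit:] if limit > 0 else []


def events_as_text(events: List[Dict[str, Any]]) -> str:
    """Render one line per event for quick inspection."""
    return "\n".join(
        f"{e['timestamp']} [{e['direction']}/{e['kind']}] {e['payload']}" for e in events
    )


# Hypothetical log entries using the fields listed above.
log = [
    {"timestamp": "t1", "direction": "inbound", "kind": "user", "payload": "track the mouse"},
    {"timestamp": "t2", "direction": "outbound", "kind": "progress", "payload": "loading video"},
    {"timestamp": "t3", "direction": "outbound", "kind": "final", "payload": "done"},
]
outbound_text = events_as_text(filter_events(log, direction="outbound", limit=1))
```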